Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries
Abstract
In this paper we look at a combination of bulk compression, partial query processing, and skipping for document-ordered inverted indexes. We propose a new inverted index organization and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both methods significantly reduce the number of processed elements and cut the average query latency by more than a factor of three. Our experiments with a real implementation and a large document collection are valuable for further research on inverted index skipping and query processing optimizations.
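To make the idea of skipping over a block-compressed, document-ordered posting list more concrete, here is a minimal sketch. It is not the index organization proposed in the paper: the class name `SkippingPostingList`, the `next_geq` method, the fixed block size, and the use of plain gap lists in place of a real bulk-compression codec are all assumptions made for illustration. Each block keeps its last document ID as a skip entry, so a lookup decodes only the single block that can contain the target.

```python
from bisect import bisect_left
from itertools import accumulate


class SkippingPostingList:
    """Hypothetical document-ordered posting list stored as blocks of doc-ID gaps."""

    def __init__(self, doc_ids, block_size=4):
        # doc_ids must be strictly increasing positive integers.
        self.blocks = []     # gap-encoded blocks (stand-in for a real codec)
        self.last_ids = []   # skip entries: last doc ID of each block
        self.decoded_blocks = 0  # counter showing how much work skipping saves
        prev = 0
        for i in range(0, len(doc_ids), block_size):
            chunk = doc_ids[i:i + block_size]
            gaps = [chunk[0] - prev] + [b - a for a, b in zip(chunk, chunk[1:])]
            self.blocks.append(gaps)
            self.last_ids.append(chunk[-1])
            prev = chunk[-1]

    def _decode(self, block_index):
        # Rebuild absolute doc IDs from gaps, starting at the previous block's last ID.
        base = self.last_ids[block_index - 1] if block_index > 0 else 0
        self.decoded_blocks += 1
        return [base + g for g in accumulate(self.blocks[block_index])]

    def next_geq(self, target):
        """Return the smallest doc ID >= target, or None if there is none."""
        # Binary search over skip entries finds the only block that can hold the answer.
        b = bisect_left(self.last_ids, target)
        if b == len(self.last_ids):
            return None
        docs = self._decode(b)
        return docs[bisect_left(docs, target)]


postings = SkippingPostingList([2, 5, 9, 14, 20, 21, 30, 41, 57, 60])
print(postings.next_geq(22))    # 30
print(postings.decoded_blocks)  # 1
```

In this sketch the lookup for 22 touches one block out of three. A pruning method such as MaxScore would call an operation like `next_geq` on the lists of low-impact terms so that documents which cannot enter the top results are never decoded.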
Related papers
Ranked Document Retrieval in (Almost) No Space
Ranked document retrieval is a fundamental task in search engines. Such queries are solved with inverted indexes that require additional 45%-80% of the compressed text space, and take tens to hundreds of microseconds per query. In this paper we show how ranked document retrieval queries can be solved within tens of milliseconds using essentially no extra space over an in-memory compressed repre...
Using Bitmap Indexing Technology for Combined Numerical and Text Queries
In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple que...
WORLDCOMP'12 Typing Instructions for Preparation of Final Camera-ready Papers
In a text database, a set of documents is maintained. To query such a database, two kinds of queries are commonly used. One is the so-called conjunctive query, represented by a set of terms connected by conjunction (∧); the other is the disjunctive query, which is also a set of terms, but connected by disjunction (∨). In this paper, we discuss an efficient and effective index mechanism...
Simon Jonassen Efficient Query Processing in
Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, wh...
Compressed Inverted Indexes for In-Memory Search Engines
We present the algorithmic core of a full text data base that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted index based search data structures. It outperforms suffix array based techniques for all the above operations for real wo...